AIbase

# Vision-to-Text Conversion

## Donut Base Finetuned Cord V2

Donut is a visual document understanding model built on a Swin Transformer encoder. This checkpoint is fine-tuned on the CORD dataset and extracts structured text from images.

- Tags: Image-to-Text, Transformers
- Author: Xenova
- Downloads: 32
- Likes: 0

## Donut Base Receipt V3

A receipt recognition model fine-tuned from naver-clova-ix/donut-base.

- License: MIT
- Tags: Large Language Model, Transformers
- Author: hyunguk1
- Downloads: 13
- Likes: 0

## Image Caption

An image captioning model built on the VisionEncoderDecoder architecture that converts input images into natural-language descriptions.

- License: Apache-2.0
- Tags: Image-to-Text, Transformers
- Author: jaimin
- Downloads: 14
- Likes: 2

## Donut Base Finetuned Cord V1 2560

Donut is an OCR-free document understanding Transformer model that combines a visual encoder with a text decoder to achieve image-to-text conversion.

- License: MIT
- Tags: Image-to-Text, Transformers
- Author: naver-clova-ix
- Downloads: 30
- Likes: 1

## Donut Base Finetuned Rvlcdip

Donut is an OCR-free document understanding Transformer model that combines a visual encoder and text decoder to process document images.

- License: MIT
- Tags: Image-to-Text, Transformers
- Author: naver-clova-ix
- Downloads: 125.36k
- Likes: 13

## Donut Base

Donut is an OCR-free document understanding Transformer model composed of a visual encoder (Swin Transformer) and a text decoder (BART).

- License: MIT
- Tags: Image-to-Text, Transformers
- Author: naver-clova-ix
- Downloads: 50.34k
- Likes: 207
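
Several of the models listed above are described as extracting structured text from images. As a rough illustration of what "structured" can mean here, the sketch below converts a tag-delimited string into a nested dictionary. It assumes the decoder emits fields wrapped in `<s_name>...</s_name>` markers, and the sample string and field names are made up for illustration. The `transformers` library provides its own converter (`DonutProcessor.token2json`); the hypothetical `parse_fields` helper below is not that implementation and skips cases such as repeated fields.

```python
import re

# Matches one field: <s_name> ... </s_name>, where the closing tag
# must carry the same name as the opening tag.
FIELD_PATTERN = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)


def parse_fields(sequence: str) -> dict:
    """Turn a tag-delimited string into a nested dict (simplified sketch).

    Assumption: fields look like <s_key>value</s_key> and may be nested.
    Repeated keys at the same level overwrite each other.
    """
    result = {}
    for match in FIELD_PATTERN.finditer(sequence):
        key = match.group(1)
        inner = match.group(2).strip()
        # Recurse when the value itself contains further tagged fields.
        result[key] = parse_fields(inner) if "<s_" in inner else inner
    return result


# Hypothetical decoder output for a one-item receipt.
sample = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"
    "<s_total><s_total_price>4.50</s_total_price></s_total>"
)
parsed = parse_fields(sample)
print(parsed)
```

In practice the string would come from decoding the model's generated token IDs. A regex-based approach like this is easy to read but brittle, so the library's converter is the safer choice for real output.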
© 2025 AIbase